A Neural Networks Committee for the Contextual Bandit Problem

Authors

Robin Allesiardo

Raphaël Féraud

Djallel Bouneffouf

Abstract

This paper presents NeuralBandit, a new contextual bandit algorithm that requires no stationarity assumption on contexts or rewards. Several neural networks are trained to model the value of rewards given the context. Two variants, based on a multi-expert approach, are proposed to choose the parameters of the multi-layer perceptrons online. The proposed algorithms are successfully tested on a large dataset with and without stationarity of rewards.
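
The idea of training neural networks to predict the reward from the context can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not state how arms are explored or how the committee chooses parameters, so this sketch uses one small perceptron per arm, epsilon-greedy exploration, and a toy two-arm environment. The names `TinyMLP`, `EpsilonGreedyNeuralBandit`, and `run_demo` are hypothetical.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer perceptron predicting the reward of a single arm."""

    def __init__(self, n_inputs, n_hidden, lr, rng):
        self.lr = lr
        # Each weight row carries one extra entry for the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _forward(self, x):
        xb = list(x) + [1.0]
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = sum(w * v for w, v in zip(self.w2, hb))
        return xb, hb, y

    def predict(self, x):
        return self._forward(x)[2]

    def update(self, x, reward):
        """One stochastic gradient step on the squared prediction error."""
        xb, hb, y = self._forward(x)
        err = y - reward
        # Hidden layer first, so the gradient uses the pre-update output weights.
        for j, row in enumerate(self.w1):
            grad_h = err * self.w2[j] * (1.0 - hb[j] ** 2)  # tanh derivative
            for i in range(len(row)):
                row[i] -= self.lr * grad_h * xb[i]
        for j in range(len(self.w2)):
            self.w2[j] -= self.lr * err * hb[j]


class EpsilonGreedyNeuralBandit:
    """Assumed design: one network per arm, epsilon-greedy arm selection."""

    def __init__(self, n_arms, n_inputs, n_hidden=4, lr=0.05, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.nets = [TinyMLP(n_inputs, n_hidden, lr, self.rng)
                     for _ in range(n_arms)]

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.nets))  # explore
        scores = [net.predict(context) for net in self.nets]
        return max(range(len(scores)), key=scores.__getitem__)  # exploit

    def update(self, context, arm, reward):
        # Only the played arm's reward is observed (bandit feedback).
        self.nets[arm].update(context, reward)


def run_demo(rounds=5000, seed=0):
    """Toy environment: arm 0 pays off for context [1, 0], arm 1 for [0, 1]."""
    env_rng = random.Random(seed + 1)
    bandit = EpsilonGreedyNeuralBandit(n_arms=2, n_inputs=2, seed=seed)
    for _ in range(rounds):
        best = env_rng.randrange(2)
        context = [1.0, 0.0] if best == 0 else [0.0, 1.0]
        arm = bandit.select(context)
        bandit.update(context, arm, 1.0 if arm == best else 0.0)
    return bandit


if __name__ == "__main__":
    trained = run_demo()
    trained.epsilon = 0.0  # act greedily when inspecting the learned policy
    print(trained.select([1.0, 0.0]), trained.select([0.0, 1.0]))
```

Because each network keeps learning from every observed reward, a sketch of this kind can follow rewards that drift over time, which is the setting the abstract targets. The learning rate, hidden-layer size, and exploration rate are fixed by hand here, whereas the abstract describes choosing such parameters online.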

Similar resources

Contextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks

In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmical one. The conceptual contribution is to formulate – as an example – the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently...

arXiv:1201.6181v2 [cs.NI] 12 Jul 2012. Contextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks, Technical Report

A committee machine approach for predicting permeability from well log data: a case study from a heterogeneous carbonate reservoir, Balal oil Field, Persian Gulf

Permeability prediction problem has been examined using several methods such as empirical formulas, regression analysis and intelligent systems especially neural networks and fuzzy logic. This study proposes an improved and novel model for predicting permeability from conventional well log data. The methodology is integration of empirical formulas, multiple regression and neuro-fuzzy in a commi...

Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model wi...

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

This paper proposes GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation. The algorithm is based on two innovations. Firstly, we present a temporal-difference based method for learning the gradient of the value-function. Secondly, we present the deviator-actor-critic (DAC) model, which comprises three neural networks that estimate the val...

Publication date: 2014
